{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Establish environment variables and services for Google Cloud API access</h2>\n",
    "+ In Console, go to <b> Products & services </b> > <b> APIs & services </b> > <b> Credentials </b>\n",
    "  + Click on <b> Create Credentials </b> and select <b> API Key </b>\n",
    "  + Copy the API KEy and paste it in the APIKEY field below.\n",
    "+\n",
    "+ In Console, go to <b> Products & services </b> > <b> Home </b>\n",
    "  + Select and copy the Project ID\n",
    "  + Paste the Project ID into both the PROJECT_ID field and the BUCKET field below.\n",
    "+\n",
    "+ Run the next block"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AIzaSyBQrrl4SZhE3QtxsnbjY2WTdgcBz0G0Rfs\n",
      "qwiklabs-gcp-14067121d7b1d12c\n",
      "\n",
      "Google Cloud API Client credentials established\n"
     ]
    }
   ],
   "source": [
    "APIKEY=\"AIzaSyBQrrl4SZhE3QtxsnbjY2WTdgcBz0G0Rfs\"   # CHANGE\n",
    "print APIKEY\n",
    "\n",
    "PROJECT_ID = \"qwiklabs-gcp-14067121d7b1d12c\"  # CHANGE\n",
    "print PROJECT_ID \n",
    "\n",
    "BUCKET = \"qwiklabs-gcp-14067121d7b1d12c\"   # CHANGE\n",
    "\n",
    "import os\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['PROJECT'] = PROJECT_ID\n",
    "\n",
    "from googleapiclient.discovery import build\n",
    "\n",
    "print(\"\\n\",\"Google Cloud API Client credentials established\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Define an API calling function </h2>\n",
    "+ Run the following block of code to define a language service API interface\n",
    "+ When this is called it will pass the text to the service using a JSON formatted block\n",
    "+ And it will receive a JSON response from the Google Cloud Language service\n",
    "+ The response will be automatically represented by Python as a 'dict' object (a dictionary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Sentiment Analysis function defined.\n"
     ]
    }
   ],
   "source": [
    "def SentimentAnalysis(text):\n",
    "    from googleapiclient.discovery import build\n",
    "    lservice = build('language', 'v1beta1', developerKey=APIKEY)\n",
    "\n",
    "    response = lservice.documents().analyzeSentiment(\n",
    "        body={\n",
    "            'document': {\n",
    "                'type': 'PLAIN_TEXT',\n",
    "                'content': text\n",
    "            }\n",
    "        }).execute()\n",
    "    \n",
    "    return response\n",
    "\n",
    "print(\"\\n\",\"Sentiment Analysis function defined.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Test the Sentiment Analysis </h2>\n",
    "+ Use a simple string to test the function and verify all the API elements are working."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "This is the Python object that is returned; a dictionary.\n",
      "\n",
      "\n",
      "Function returns : <type 'dict'>\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.8, u'score': 0.8}, u'language': u'en', u'sentences': [{u'text': {u'content': u'There are places I remember, all my life though some have changed.', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.8, u'score': 0.8}}]}\n",
      "\n",
      "This is the JSON formatted version of the object\n",
      "{\n",
      "    \"documentSentiment\": {\n",
      "        \"magnitude\": 0.8, \n",
      "        \"polarity\": 1, \n",
      "        \"score\": 0.8\n",
      "    }, \n",
      "    \"language\": \"en\", \n",
      "    \"sentences\": [\n",
      "        {\n",
      "            \"sentiment\": {\n",
      "                \"magnitude\": 0.8, \n",
      "                \"polarity\": 1, \n",
      "                \"score\": 0.8\n",
      "            }, \n",
      "            \"text\": {\n",
      "                \"beginOffset\": -1, \n",
      "                \"content\": \"There are places I remember, all my life though some have changed.\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "sampleline = u'There are places I remember, all my life though some have changed.'\n",
    "\n",
    "\n",
    "results = SentimentAnalysis(sampleline)\n",
    "print(\"\\n\",\"This is the Python object that is returned; a dictionary.\")\n",
    "print(\"\\n\")\n",
    "print(\"Function returns :\",type(results))\n",
    "print(results)\n",
    "\n",
    "import json\n",
    "print(\"\\n\",\"This is the JSON formatted version of the object\")\n",
    "print(json.dumps(results, sort_keys=True, indent=4))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Use the Dataproc cluster to run a Spark job that uses the Machine Learning API </h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pyspark.rdd.PipelinedRDD'>\n",
      "<type 'list'>\n",
      "\n",
      "\n",
      "{u'documentSentiment': {u'polarity': 0, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': []}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'roads diverged in a yellow wood,', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0.4, u'score': -0.4}, u'language': u'en', u'sentences': [{u'text': {u'content': u'And sorry I could not travel both', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0.4, u'score': -0.4}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'And be one traveler, long I stood', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': [{u'text': {u'content': u'And looked down one as far as I could', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0, u'score': 0}}]}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0.6, u'score': -0.6}, u'language': u'en', u'sentences': [{u'text': {u'content': u'To where it bent in the undergrowth;', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0.6, u'score': -0.6}}]}\n",
      "{u'documentSentiment': {u'polarity': 0, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': []}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0.1, u'score': -0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Then took the other, as just as fair,', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0.1, u'score': -0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0.1, u'score': -0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'And having perhaps the better claim,', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0.1, u'score': -0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.5, u'score': 0.5}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Because it was grassy and wanted wear;', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.5, u'score': 0.5}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.4, u'score': 0.4}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Though as for that the passing there', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.4, u'score': 0.4}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Had worn them really about the same,', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0, u'score': 0}}]}\n",
      "{u'documentSentiment': {u'polarity': 0, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': []}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'And both that morning equally lay', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0.1, u'score': -0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'In leaves no step had trodden black.', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0.1, u'score': -0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Oh, I kept the first for another day!', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0, u'score': 0}}]}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Yet knowing how way leads on to way,', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0, u'score': 0}}]}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': [{u'text': {u'content': u'I doubted if I should ever come back.', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0, u'score': 0}}]}\n",
      "{u'documentSentiment': {u'polarity': 0, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': []}\n",
      "{u'documentSentiment': {u'polarity': -1, u'magnitude': 0.6, u'score': -0.6}, u'language': u'en', u'sentences': [{u'text': {u'content': u'I shall be telling this with a sigh', u'beginOffset': -1}, u'sentiment': {u'polarity': -1, u'magnitude': 0.6, u'score': -0.6}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Somewhere ages and ages hence:', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'Two roads diverged in a wood, and I-', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}, u'language': u'en', u'sentences': [{u'text': {u'content': u'I took the one less traveled by,', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.1, u'score': 0.1}}]}\n",
      "{u'documentSentiment': {u'polarity': 1, u'magnitude': 0.2, u'score': 0.2}, u'language': u'en', u'sentences': [{u'text': {u'content': u'And that has made all the difference.', u'beginOffset': -1}, u'sentiment': {u'polarity': 1, u'magnitude': 0.2, u'score': 0.2}}]}\n",
      "{u'documentSentiment': {u'polarity': 0, u'magnitude': 0, u'score': 0}, u'language': u'en', u'sentences': []}\n"
     ]
    }
   ],
   "source": [
    "# Working with the smaller sample file\n",
    "#\n",
    "lines = sc.textFile(\"/sampledata/road-not-taken.txt\")\n",
    "#\n",
    "# The Spark map transformation will execute SentimentAnalysis on each element in lines and store the result in sentiment.\n",
    "# Remember that due to lazy execution, this line just queues up the transformation, it does not run yet.\n",
    "# So you will not see errors at this point.\n",
    "#\n",
    "sentiment = lines.map(SentimentAnalysis)\n",
    "#\n",
    "#\n",
    "print (type(sentiment))\n",
    "# sentiment is a pyspark.rdd.PipelinedRDD\n",
    "#\n",
    "# If it is properly formed then an action such as sentiment.collect() will run the job.\n",
    "# If not properly formed, it will throw errors.\n",
    "#\n",
    "output = sentiment.collect()\n",
    "#\n",
    "# The sentiment rdd contains JSON returns. In python these are collected into a list of dictionaries.\n",
    "#\n",
    "print(type(output))\n",
    "print(\"\\n\")\n",
    "\n",
    "for line in output:\n",
    "  print(line)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Working with the results in Python </h2>\n",
    "+ The collect() action is good for validating sample output, but don't use it on big data because all the results must fit in memory.\n",
    "+ The following shows how the data that is passed back is formatted as a Python list of dictionary objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score:  0  Magnitude : 0\n",
      "Score:  0.1  Magnitude : 0.1\n",
      "Score:  -0.4  Magnitude : 0.4\n",
      "Score:  0.1  Magnitude : 0.1\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  -0.6  Magnitude : 0.6\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  -0.1  Magnitude : 0.1\n",
      "Score:  -0.1  Magnitude : 0.1\n",
      "Score:  0.5  Magnitude : 0.5\n",
      "Score:  0.4  Magnitude : 0.4\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  0.1  Magnitude : 0.1\n",
      "Score:  -0.1  Magnitude : 0.1\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  0  Magnitude : 0\n",
      "Score:  -0.6  Magnitude : 0.6\n",
      "Score:  0.1  Magnitude : 0.1\n",
      "Score:  0.1  Magnitude : 0.1\n",
      "Score:  0.1  Magnitude : 0.1\n",
      "Score:  0.2  Magnitude : 0.2\n",
      "Score:  0  Magnitude : 0\n"
     ]
    }
   ],
   "source": [
    "# \n",
    "# Ouput is a list of dictionaries\n",
    "# When the list is iterated, each line is one dictionary \n",
    "# And the dictionary is double-subscripted\n",
    "#\n",
    "for line in output:\n",
    "  print(\"Score: \",line['documentSentiment']['score'], \" Magnitude :\",line['documentSentiment']['magnitude'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Using another feature of the Natural Language API </h2>\n",
    "+ In this code block there is an analyze Entities version of the Language Service\n",
    "+ Analysis of entities identifies and recognizes items in text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Tailored Analysis function defined.\n"
     ]
    }
   ],
   "source": [
    "def TailoredAnalysis(text):\n",
    "    from googleapiclient.discovery import build\n",
    "    lservice = build('language', 'v1beta1', developerKey=APIKEY)\n",
    "\n",
    "    response = lservice.documents().analyzeEntities(\n",
    "        body={\n",
    "            'document': {\n",
    "                'type': 'PLAIN_TEXT',\n",
    "                'content': text\n",
    "            }\n",
    "        }).execute()\n",
    "\n",
    "    return response\n",
    "  \n",
    "print(\"\\n\",\"Tailored Analysis function defined.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Working with the results in Spark </h2>\n",
    "+ Working with results in Python can be useful. However, if the results are too big to fit in memory, you will want to perform transformations\n",
    "+ on the data while still in RDDs using Spark. In this section you will explore transforming the results of the TailoredAnalysis function in Spark.\n",
    "\n",
    "<h3> Load some sample files into Cloud Storage </h3>\n",
    "+ Step 1: Locate some samle text such as a news article in your browser. Copy the text into a file called sample.txt\n",
    "+ Step 2: SSH to the Master Node\n",
    "+ Step 3: Upload the file to Cloud Storage\n",
    " - gcloud storage cp sample.txt gs://[your-bucket]\n",    "\n",
    "+ Step 4: Download some more prepared sample files: \n",
    "  - curl https://storage.googleapis.com/cloud-training/archdp/sherlock2.txt > sherlock2.txt\n",
    "  - curl https://storage.googleapis.com/cloud-training/archdp/sherlock3.txt > sherlock3.txt\n",
    "  - curl https://storage.googleapis.com/cloud-training/archdp/sherlock4.txt > sherlock4.txt\n",
    "+ Step 5: Upload the files to your Cloud Storage:  \n",
    " - gcloud storage cp sample* gs://[your-bucket]\n",    "\n",
    "<h3> Run the analysis and variations </h3>\n",
    "+ This code recognizes people and locations mentioned in the document, and returns a list of them with the number of mentions.\n",
    "+\n",
    "+ Step 6: In the following code block, replace the bucket name [your-bucket] with your bucket\n",
    "+ After you run the block and see the results, change the word 'PERSON' to 'LOCATION' and run it again\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(u'Cadger', 1), (u'Eight', 1), (u'Hettie Potter', 1), (u'Homer', 1), (u'Nebuchadnezzar', 1), (u'Neither', 1), (u'Phoenician', 1), (u'Psychologist', 23), (u'Simon Newcomb', 1), (u'Watchett', 1), (u'conductors', 1), (u'couple', 1), (u'crowd', 1), (u'driver', 1), (u'eddy', 1), (u'friend', 1), (u'group', 2), (u'historian', 1), (u'host', 1), (u'mathematicians', 1), (u'noun substantives', 1), (u'rest', 1), (u'schoolmaster', 1), (u'some', 1), (u'speaker', 1)]\n"
     ]
    }
   ],
   "source": [
    "\n",
    "# [STEP 1] HDFS \n",
    "#lines = sc.textFile(\"/sampledata/road-not-taken.txt\")\n",
    "#\n",
    "#\n",
    "# [STEP 2] Cloud Storage \n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P1.txt\")\n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P2.txt\")\n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P3.txt\")\n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P4.txt\")\n",
    "\n",
    "lines = sc.textFile(\"gs://qwiklabs-gcp-14067121d7b1d12c/time-machine-P1.txt\")\n",
    "#\n",
    "#\n",
    "#\n",
    "\n",
    "entities = lines.map(TailoredAnalysis)\n",
    "\n",
    "from operator import add\n",
    "rdd = entities.map(lambda x: x['entities'])\n",
    "#\n",
    "# results = rdd.flatMap(lambda x: x ).filter(lambda x: x['type']==u'PERSON').map(lambda x:(x['name'],1)).reduceByKey(add)\n",
    "#\n",
    "# It is common practice to use line continuation \"\\\" to help make the chain more readable\n",
    "results  = rdd.flatMap(lambda x: x )\\\n",
    "              .filter(lambda x: x['type']==u'PERSON')\\\n",
    "              .map(lambda x:(x['name'],1))\\\n",
    "              .reduceByKey(add)\n",
    "              \n",
    "\n",
    "print(sorted(results.take(25)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(u'I', 1), (u'London', 3), (u'Oriental', 1), (u'State', 1), (u'Thames valley', 1), (u'beach', 1), (u'buildings', 4), (u'cemeteries', 1), (u'dining-halls', 1), (u'everywhere', 3), (u'hill crest', 1), (u'hill slopes', 1), (u'neighbourhood', 1), (u'palace', 2), (u'planets', 1), (u'plants', 3), (u'railways', 1), (u'ruin', 1), (u'shop', 1), (u'sky', 5), (u'slope', 3), (u'somewhere', 1), (u'state', 1), (u'wells', 7), (u'workrooms', 1)]\n"
     ]
    }
   ],
   "source": [
    "# [STEP 3] Cloud Storage \n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P1.txt\")\n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P2.txt\")\n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P3.txt\")\n",
    "#lines = sc.textFile(\"gs://<your-bucket>/time-machine-P4.txt\")\n",
    "#\n",
    "lines = sc.textFile(\"gs://qwiklabs-gcp-14067121d7b1d12c/time-machine-P2.txt\")\n",
    "#\n",
    "\n",
    "entities = lines.map(TailoredAnalysis)\n",
    "\n",
    "from operator import add\n",
    "rdd = entities.map(lambda x: x['entities'])\n",
    "#\n",
    "# results = rdd.flatMap(lambda x: x ).filter(lambda x: x['type']==u'PERSON').map(lambda x:(x['name'],1)).reduceByKey(add)\n",
    "#\n",
    "# It is common practice to use line continuation \"\\\" to help make the chain more readable\n",
    "results  = rdd.flatMap(lambda x: x )\\\n",
    "              .filter(lambda x: x['type']==u'LOCATION')\\\n",
    "              .map(lambda x:(x['name'],1))\\\n",
    "              .reduceByKey(add)\n",
    "              \n",
    "\n",
    "print(sorted(results.take(25)))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Save as text to Cloud Storage </h2>\n",
    "<h3>Write file to Cloud Storage</h3>\n",
    "+ Replace the bucket in the example with your bucket.\n",
    "+ Run the next block to save the RDD to cloud storage. \n",
    "+ repartition(1) reorganizes the RDD internally into a single partition.\n",
    "+ saveAsTextFile saved the partition in a folder called sampleoutput. \n",
    "\n",
    "<h3>View Cloud Storage in Console</h3>\n",
    "+ In Console, go to the Cloud Storage Browser, locate the sampleoutput folder, and look inside.\n",
    "+ Inside the folder you will find part-xxxxx coresponding to the partitions in the RDD.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "          <script src=\"/static/components/requirejs/require.js\"></script>\n",
       "          <script>\n",
       "            requirejs.config({\n",
       "              paths: {\n",
       "                base: '/static/base',\n",
       "              },\n",
       "            });\n",
       "          </script>\n",
       "          "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Output to Cloud Storage is complete.\n"
     ]
    }
   ],
   "source": [
    "# Replace with your bucket\n",
    "#\n",
    "results.repartition(1).saveAsTextFile(\"gs://qwiklabs-gcp-14067121d7b1d12c/sampleoutput/\")\n",
    "\n",
    "print(\"Output to Cloud Storage is complete.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Natural Language API </h2>\n",
    "+ Now you see the power of mixing NLP API and Big Data\n",
    "+ It seems almost like a magic trick that a computer program can \"read\" an article and understand \"who the people\" are and \"where things occurred\".\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
